Applying Text Categorization to Vocabulary Enhancement for Japanese-English Cross-Language Retrieval

Authors

  • Fredric Gey
  • Aitao Chen
  • Hailing Jiang
Abstract

In this paper we explore a new method for vocabulary enhancement in cross-language retrieval. The focus is on whether we can improve upon dictionary-based retrieval, machine translation of queries, or the use of a bilingual lexicon derived from parallel corpus alignment. All experiments use the NACSIS collection of Japanese scientific abstracts, whose titles and author-assigned keywords are also available in English. Our results show that using text categorization methods to mine documents for words associated with subject keywords provides performance almost equivalent to direct vocabulary search, but no better than bilingual lexicon-based retrieval.
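
The abstract does not say which categorization method is used, so the following is only a minimal sketch of the general idea: mining documents for words associated with their assigned English keywords, then using those words to expand a query. The scoring function (document-level pointwise mutual information), the function names `build_associations` and `expand_query`, and the toy pre-segmented documents are all assumptions made for illustration, not the authors' implementation or data.

```python
import math
from collections import Counter, defaultdict


def build_associations(docs):
    """Map each keyword to {word: score} from (words, keywords) pairs.

    The score is pointwise mutual information over document-level
    co-occurrence. This is an assumed stand-in for a text categorization
    model, not the method used in the paper.
    """
    n_docs = len(docs)
    word_df = Counter()             # number of documents containing each word
    kw_df = Counter()               # number of documents tagged with each keyword
    pair_df = defaultdict(Counter)  # keyword -> word -> joint document count
    for words, keywords in docs:
        unique_words = set(words)
        unique_kws = set(keywords)
        word_df.update(unique_words)
        kw_df.update(unique_kws)
        for kw in unique_kws:
            pair_df[kw].update(unique_words)

    assoc = {}
    for kw, counts in pair_df.items():
        assoc[kw] = {
            w: math.log((joint * n_docs) / (kw_df[kw] * word_df[w]))
            for w, joint in counts.items()
        }
    return assoc


def expand_query(query_keywords, assoc, top_k=3):
    """Return up to top_k words positively associated with the query keywords."""
    totals = Counter()
    for kw in query_keywords:
        for w, score in assoc.get(kw, {}).items():
            if score > 0:  # keep only words that co-occur more than chance
                totals[w] += score
    return [w for w, _ in totals.most_common(top_k)]


# Hypothetical toy collection: pre-segmented Japanese words paired with
# English author-assigned keywords.
toy_docs = [
    (["情報", "検索", "評価"], ["information retrieval"]),
    (["情報", "検索", "翻訳"], ["information retrieval", "machine translation"]),
    (["機械", "翻訳", "辞書"], ["machine translation"]),
    (["画像", "処理"], ["image processing"]),
]

assoc = build_associations(toy_docs)
expanded = expand_query(["information retrieval"], assoc)
```

In this sketch an English query is first matched to subject keywords, and the Japanese words most strongly associated with those keywords become the search vocabulary. A real system would also need Japanese word segmentation and a much larger training collection.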


Similar resources

Query Translation by Text Categorization

We report on the development of a cross-language information retrieval system, which translates user queries by categorizing these queries into terms listed in a controlled vocabulary. Unlike usual automatic text categorization systems, which rely on data-intensive models induced from large training data, our automatic text categorization tool applies data-independent classifiers: a vector-space...

Studying the Effect of Retrieval Direction during Reading on Productive and Receptive Knowledge of Vocabulary

Retrieval tasks provide learners with an opportunity to focus both on meaning and on form. There are four different retrieval directions. The present study aimed to identify the optimal direction of recall type retrievals during reading and to investigate the outcomes of each one. Forty-eight intermediate EFL learners took part in the study. One of the experimental groups was provided with the ...

Comparing Multiple Methods for Japanese and Japanese-English Text Retrieval

The NACSIS collection of Japanese scientific documents (with English titles) provides a solid foundation for information retrieval research into 1) segmentation methods for Japanese text, 2) effective methods for monolingual Japanese retrieval, and 3) Japanese-English cross-language retrieval. This paper compares multiple methods for Japanese and Japanese-English text retrieval. Our focus is on ac...

Building Linked Open Data of Life Science Dictionary

There is a growing need for efficient and integrated access to databases provided by diverse institutions. Using a linked data design pattern allows the diverse data on the Internet to be linked effectively and accessed efficiently by computers. In addition, providing a dictionary to translate words into another language in Resource Description Framework (RDF) is useful to cross a language barr...

Query and Document Translation by Automatic Text Categorization: A Simple Approach to Establish a Strong Textual Baseline for ImageCLEFmed 2006

In this paper, we report on the fusion of simple retrieval strategies with thesaural resources in order to perform document and query translation for cross-language retrieval in a collection of medical cases. The collection contains textual and visual contents. In this paper, we focus on the textual contents of the collection, which contains documents in three languages: French, English and Ger...


Publication date: 1999